Continual Learning for Monolingual End-to-End Automatic Speech Recognition (arXiv:2112.09427v2 [eess.AS])
Adapting Automatic Speech Recognition (ASR) models to new domains leads to a
deterioration of performance on the original domain(s), a phenomenon called
Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to
new accents, dialects, or topics without suffering from CF, which prevents them
from being improved continually unless all past data are stored. Fortunately,
Continual Learning (CL) methods, which aim to enable continual adaptation while
overcoming CF, can be used. In this paper, we implement an extensive set of CL
methods for End-to-End ASR and evaluate and compare their ability to extend a
monolingual Hybrid CTC-Transformer model across four new tasks. We find that
the best-performing CL method closes the gap between the fine-tuned model
(lower bound) and the model trained jointly on all tasks (upper bound) by more
than 40%, while requiring access to only 0.6% of the original data.
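For concreteness, the gap-closure figure can be read as a relative reduction in error. A plausible definition (assumed here, with word error rate as the metric, since the abstract does not spell it out) is:

\[
\text{Gap closed} \;=\; \frac{\mathrm{WER}_{\text{fine-tuned}} - \mathrm{WER}_{\text{CL}}}{\mathrm{WER}_{\text{fine-tuned}} - \mathrm{WER}_{\text{joint}}} \times 100\%,
\]

so that 0% corresponds to naive fine-tuning (the lower bound) and 100% to joint training on all tasks (the upper bound); "more than 40%" then means the CL method recovers over 40% of the performance lost to CF.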
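The reliance on only a small fraction (0.6%) of the original data is characteristic of rehearsal-based CL, which replays a stored buffer of past utterances alongside new-task data during adaptation. Below is a minimal illustrative sketch of experience replay in PyTorch-style Python; `asr_model`, `compute_loss`, and the data handling are hypothetical placeholders, not the paper's actual implementation.

```python
import random

# Hypothetical components: `asr_model` stands in for a hybrid
# CTC-Transformer ASR model, and `compute_loss` for its training
# loss (assumed to return a differentiable PyTorch scalar).

def build_replay_buffer(old_dataset, fraction=0.006):
    """Keep a small random fraction (here 0.6%) of the original training data."""
    n_keep = max(1, int(len(old_dataset) * fraction))
    return random.sample(list(old_dataset), n_keep)

def adapt_with_replay(asr_model, compute_loss, optimizer,
                      new_task_loader, replay_buffer, replay_batch_size=4):
    """One epoch of adaptation: each new-task batch is paired with a few
    utterances replayed from the original task to mitigate forgetting.
    (Collation of the replayed utterances into a batch is elided.)"""
    asr_model.train()
    for new_batch in new_task_loader:
        # Loss on the new-task batch (the data we are adapting to).
        loss = compute_loss(asr_model, new_batch)

        # Loss on a small replayed batch from the original task.
        replay_batch = random.sample(
            replay_buffer, min(replay_batch_size, len(replay_buffer)))
        loss = loss + compute_loss(asr_model, replay_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The key design point is that the replay buffer is fixed and tiny relative to the original corpus, so the memory cost stays bounded even as new tasks accumulate.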